LTCC Design of Experiments

Week 4: Interference and Multi-objective Experimentation

Vasiliki Koutra

Introduction

Welcome

This week focuses on two active research areas: experimental design under treatment interference and multi-objective experimental design.

Figure attribution: Subset of the Facebook network from the Stanford Data Collection.

Common DoE assumptions

To a greater or lesser degree, the ideas and principles from Weeks 1-3 are dependent on a number of assumptions holding:

  • that the expected response from a given unit only depends on the treatment applied to that unit, and not on the treatments applied to any other units;

  • for factorial, response surface and optimal designs, that a reasonable approximating statistical model can be specified;

  • for optimal designs, that the aim of the experiment can be neatly encapsulated in a single mathematical expression or objective function.

Relaxing the assumptions

This week, we will focus on approaches that allow us to relax one or the other of these assumptions:

  • in Part 1 (Experiments with interference), we will introduce methods for designing and analysing experiments when treatment interference is anticipated;

  • in Part 2 (Multi-objective experimentation), we will introduce multi-objective (compound) design optimality criteria that address multiple experimental aims simultaneously.

Experiments with interference

SUTVA

Standard model for analysing a designed experiment,

y_{i} = \mu + \tau_{r(i)} + \varepsilon_i\,, \tag{1}

with the aim of estimating treatment differences \tau_j - \tau_k. Here, r(i) \in \{1,\ldots,t\} indicates which treatment was allocated to the ith unit (i = 1,\ldots,n).

This model makes the stable unit treatment value assumption (SUTVA), which states that the response from any particular unit is unaffected by the assignment of treatments to other units (Cox 1958, sec. 2.4).

Example experiments

  1. A clinical trial split into (time) periods. Within a period, each patient will be assigned one of the treatments. Across the whole experiment, each patient will be assigned all the treatments.

  2. An agricultural experiment in which the field available for the experiment is split into different plots, with one treatment assigned to each plot.

  3. A marketing experiment to assess the effectiveness of different adverts, in which each user on (a subset of) the platform is shown one of a number of different adverts.

Treatment interference

What do all three of these experiments have in common? Possible treatment interference (or treatment carryover or spillover).

  1. The clinical response obtained from the application of a treatment in a given period may also be affected by the treatment applied in the preceding period.

  2. The response, e.g., crop yield, from a given plot may be affected by the variety of wheat applied to neighbouring plots, due to shading or attractiveness to pests.

  3. The response from a particular social media user to an advert may be influenced by the adverts seen by their connections or friends.

Ignoring substantial treatment interference, as in model (1), can lead to biased estimates of differences between the direct treatment effects \tau_j.

Mitigation of treatment interference may be possible, e.g., by adding “wash-out” periods in the clinical trial or “guard plots” in the agricultural experiment.

But in many cases this may not be possible (or ethical) or there may be interest in the indirect effect of each treatment; for example, the viral effect of the adverts in the marketing experiment.

Hence, it is of interest to study designs and models which account for treatment interference.

Cross-over trials

In a cross-over trial, each subject is assigned a sequence of treatments across different time periods. Interest is in comparing individual treatments, not sequences, and the experimental units are the periods within each subject.

Cross-over trials are common in studies of chronic conditions, where repeated treatment is required.

Advantages include

  • each subject acts as their own control;
  • and treatment comparisons can be made “within subject”.

See Jones and Kenwood (2015) and Bose and Dey (2015).

Order and carry-over effects

However, this feature of within subject comparison can also bring disadvantages. Principally,

  • the order in which the treatments are applied may impact the outcome;
  • there may be treatment interference between periods of the design (typically called carry-over in a cross-over trial). That is, the outcome from period i may depend on the treatment applied in period i-1 (first-order interference) or even on treatments applied in earlier periods.

2x2 cross-over trial

The simplest form of cross-over design concerns t=2 treatments and p=2 periods.

Table 1: Sequences for a 2x2 cross-over trial.
Sequence Period 1 Period 2
1 A B
2 B A

There are clearly only two possible sequences and each subject in the trial is randomised to one of the two. A wash-out period may be inserted between the two treatment periods.

Example

The aim was to investigate the efficacy of an inhaled drug (A), compared to a control (B), for patients suffering from chronic obstructive pulmonary disease (COPD). The response was the mean morning peak expiratory flow rate (PEFR), based on readings recorded each morning by the subjects.

Table 2: Mean morning PEFR (L/min) from a 2\times 2 cross-over trial (adapted from Jones and Kenwood 2015, ch. 2).
Sequence AB (left three columns), Sequence BA (right three columns)
Subject Period 1 Period 2 Subject Period 1 Period 2
1 121.905 116.667 28 138.333 138.571
2 218.500 200.500 29 225.000 256.250
3 235.000 217.143 30 392.857 381.429
4 250.000 196.429 31 190.000 233.333
5 186.190 185.500 32 191.429 228.000
6 231.563 221.842 33 226.190 267.143
7 443.250 420.500 34 201.905 193.500
8 198.421 207.692 35 134.286 128.947
9 270.500 213.158 36 238.000 248.500
10 360.476 384.000 37 159.500 140.000
11 229.750 188.250 38 232.750 276.563
12 159.091 221.905 39 172.308 170.000
13 255.882 253.571 40 266.000 305.000
14 279.048 267.619 41 171.333 186.333
15 160.556 163.000 42 194.737 191.429
16 172.105 182.381 43 200.000 222.619
17 267.000 313.000 44 146.667 183.810
18 230.750 211.111 45 208.000 241.667
19 271.190 257.619 46 208.750 218.810
20 276.250 222.105 47 271.429 225.000
21 398.750 404.000 48 143.810 188.500
22 67.778 70.278 49 104.444 135.238
23 195.000 223.158 50 145.238 152.857
24 325.000 306.667 51 215.385 240.476
25 368.077 362.500 52 306.000 288.333
26 228.947 227.895 53 160.526 150.476
27 236.667 220.000 54 353.810 369.048

Linear model for cross-over trials

The traditional linear model used with cross-over experiments contains terms corresponding to subjects, period, treatment and interference:

\begin{split} y_{ij} = \mu + \alpha_i + \beta_j + \tau_{r(i,j)} + \rho_{r(i-1,j)} + \varepsilon_{ij}\,,\\ i = 1,\ldots,p;\, j = 1,\ldots, n\,, \end{split} \tag{2}

  • r(i,j) \in \{1,\ldots,t\} denotes the treatment allocated to the jth subject in the ith period,
  • \alpha_i is the ith period effect,
  • \beta_j is the jth subject effect,
  • \rho_{r(i-1,j)} is the interference (indirect, carry-over) effect, with \rho_{r(0,j)}=0.
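To make model (2) concrete for the 2x2 design: if carry-over is assumed absent, the subject effects cancel in the within-subject differences d_j = y_{1j} - y_{2j}, and the direct treatment difference and the period difference can be estimated from the two sequence-group means of d_j. The sketch below illustrates this using only the first three subjects of each sequence from Table 2; the function name and the choice of subset are illustrative assumptions, not part of the original analysis.

```python
from statistics import mean

# First three subjects per sequence from Table 2: (period 1, period 2) PEFR.
# This small subset is for illustration only, not the full analysis.
seq_ab = [(121.905, 116.667), (218.500, 200.500), (235.000, 217.143)]
seq_ba = [(138.333, 138.571), (225.000, 256.250), (392.857, 381.429)]

def crossover_2x2_estimates(ab, ba):
    """Return (tau_A - tau_B, alpha_1 - alpha_2), assuming no carry-over."""
    d_ab = mean(y1 - y2 for y1, y2 in ab)   # estimates (alpha_1 - alpha_2) + (tau_A - tau_B)
    d_ba = mean(y1 - y2 for y1, y2 in ba)   # estimates (alpha_1 - alpha_2) - (tau_A - tau_B)
    treatment_diff = (d_ab - d_ba) / 2
    period_diff = (d_ab + d_ba) / 2
    return treatment_diff, period_diff

treatment_diff, period_diff = crossover_2x2_estimates(seq_ab, seq_ba)
print(f"A - B: {treatment_diff:.3f}, period 1 - period 2: {period_diff:.3f}")
```

Passing all 27 subjects per sequence in the same format would give the estimates for the full trial.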

Analysis of variance

Table 3: Analysis of variance from PEFR cross-over trial
df Sum Sq. Mean Sq. F-value P-value
sequence (interference) 1 21834.440 21834.440 2.013 0.162
between subject residual 52 563973.220 10845.639
period 1 313.444 313.444 0.936 0.338
treatment 1 2723.059 2723.059 8.128 0.006
within subject residual 52 17421.912 335.037
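The F-values in Table 3 can be reproduced from the sums of squares, which makes the two error strata explicit: the sequence (interference) term is compared with the between-subject residual, while period and treatment are compared with the within-subject residual. A minimal check, with dictionary keys chosen here purely for illustration:

```python
# Sums of squares and degrees of freedom copied from Table 3.
ss = {"sequence": 21834.440, "between_resid": 563973.220,
      "period": 313.444, "treatment": 2723.059, "within_resid": 17421.912}
df = {"sequence": 1, "between_resid": 52,
      "period": 1, "treatment": 1, "within_resid": 52}

# Mean square = sum of squares / degrees of freedom.
ms = {term: ss[term] / df[term] for term in ss}

# Each F-value uses the residual mean square from its own stratum.
f_values = {
    "sequence": ms["sequence"] / ms["between_resid"],
    "period": ms["period"] / ms["within_resid"],
    "treatment": ms["treatment"] / ms["within_resid"],
}
for term, f in f_values.items():
    print(f"{term}: F = {f:.3f} on (1, 52) df")
```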

Notes:

  • the between-sequence sum of squares tests \rho_1 = \rho_2;
  • the correct denominator for this test comes from the between-subject residual;
  • we need the assumption of no interference to test for direct treatment effects (complete balance is not possible in this design, as a treatment never follows itself).
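The role of the no-interference assumption can be seen by working through model (2) for the 2\times 2 design. As a sketch, label treatment A as 1 and treatment B as 2, and write d_j = y_{1j} - y_{2j} for the within-subject difference (notation introduced here for illustration). The subject effects cancel, giving

\begin{split} \text{E}(d_j) & = (\alpha_1 - \alpha_2) + (\tau_1 - \tau_2) - \rho_1 \quad \text{(sequence AB)}\,, \\ \text{E}(d_j) & = (\alpha_1 - \alpha_2) - (\tau_1 - \tau_2) - \rho_2 \quad \text{(sequence BA)}\,. \end{split}

Half the difference between the two sequence-group means of d_j therefore has expectation

\text{E}\left[\tfrac{1}{2}\left(\bar{d}_{AB} - \bar{d}_{BA}\right)\right] = (\tau_1 - \tau_2) - \tfrac{1}{2}(\rho_1 - \rho_2)\,,

so it is unbiased for the direct treatment difference only when \rho_1 = \rho_2.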

Larger cross-over designs

A balanced design has each treatment occurring the same number of times in each period, and each treatment following every other treatment the same number of times, with no treatment following itself.

  • A 2\times 2 trial is balanced.
  • Every estimated pairwise direct treatment difference, \hat{\tau}_k - \hat{\tau}_l, has the same variance.

A strongly balanced design has every treatment following every treatment, including itself, the same number of times.

  • Direct and indirect effects are estimated independently, simplifying the analysis and interpretation, and lowering the variance of estimators of the indirect effects.

Balanced 4x4 Latin square

Table 4: Balanced Latin square design for four treatments.
Sequence Period 1 Period 2 Period 3 Period 4
1 A D B C
2 B A C D
3 C B D A
4 D C A B

Strongly balanced extra-period design

Table 5: Strongly balanced extra-period design for four treatments.
Sequence Period 1 Period 2 Period 3 Period 4 Period 5
1 A D B C C
2 B A C D D
3 C B D A A
4 D C A B B
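The balance properties of Tables 4 and 5 can be checked mechanically by counting, for every ordered pair of treatments, how often one immediately follows the other. The helper names below are illustrative assumptions; the definitions they encode are the ones given above for balanced and strongly balanced designs.

```python
from collections import Counter
from itertools import product

latin_square = ["ADBC", "BACD", "CBDA", "DCAB"]        # sequences from Table 4
extra_period = ["ADBCC", "BACDD", "CBDAA", "DCABB"]    # sequences from Table 5

def carryover_counts(sequences):
    """Count (previous treatment, current treatment) pairs over all sequences."""
    return Counter((prev, cur) for seq in sequences for prev, cur in zip(seq, seq[1:]))

def uniform_on_periods(sequences):
    """True if every treatment appears equally often in every period."""
    treatments = sorted(set("".join(sequences)))
    for period in zip(*sequences):
        counts = Counter(period)
        if len({counts[t] for t in treatments}) != 1:
            return False
    return True

def is_balanced(sequences, strongly=False):
    """Check (strong) balance for first-order carry-over."""
    treatments = sorted(set("".join(sequences)))
    counts = carryover_counts(sequences)
    pairs = [(a, b) for a, b in product(treatments, repeat=2) if strongly or a != b]
    same_count = len({counts[p] for p in pairs}) == 1 and counts[pairs[0]] > 0
    no_self = strongly or all(counts[(a, a)] == 0 for a in treatments)
    return uniform_on_periods(sequences) and same_count and no_self

print(is_balanced(latin_square))                  # balanced
print(is_balanced(extra_period, strongly=True))   # strongly balanced
```

Repeating the final period is what adds the four self-following pairs (A after A, and so on) that the Latin square lacks.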

Interference in other trials

Interference can also occur in other trials, including parallel group trials without repeated treatment applications.

  • E.g., assessing public health interventions.

One mitigation strategy is the use of a cluster randomised trial (CRT).

  • All subjects in a cluster or group receive the same treatment.
  • Typically clusters will be designed to limit possible treatment interference between clusters.

In other trials, the indirect effect of each treatment may be of interest in itself, e.g., in vaccine trials.

  • Methods are required for the design and modelling of experiments to estimate indirect and total effects (Hudgens and Halloran 2008).

Networked experiments

Experiments from outside the clinical arena can also violate SUTVA.

For example, online controlled experiments on websites and social media platforms (Larsen et al. 2024).

  • Connections between users can lead to treatment interference.
  • There could be distinct communities within the network, with more similar responses expected from users in the same community.

Figure 1: Subset of the Facebook network from the Stanford Data Collection. Colours indicate 24 distinct blocks, or communities, of users.

Other examples

Table 6: Examples of networked experiments.
Field of study Intervention Connections Response
Marketing Advertisement Virtual friendships Product awareness
Agriculture Pesticide Geographic proximity Crop yield
Health Infection control information Patient contacts in hospital Disease incidence
Politics Direct mailing Voter interactions Voting behaviour
Education Incentivised food choices Social links Snack choice
Ecology Reward-based intervention Animal interactions Reaction speed
Law enforcement Surveillance Geographical proximity Crime rate

Graphs of designs

Suppose that n units are formed into a network with connections representing possible treatment interference.

This network can be represented as a graph \mathcal{G} = (\mathcal{V}, \mathcal{E}),

  • vertex set \mathcal{V} represents units;
  • edge set \mathcal{E}, of size l, represents the connections.

Connections can be succinctly represented via the adjacency matrix.

  • A = [A_{jh}], an n\times n matrix with A_{jh} \in \{0, 1\}.
  • Undirected graphs have A_{jh} = A_{hj}.
  • Lack of an edge between nodes j and h in \mathcal{E} leads to A_{jh} = A_{hj} = 0.

Table 7: Example adjacency matrix for the graph \mathcal{G} in Figure 2 with |\mathcal{V}| = 5 vertices and |\mathcal{E}| = 6 undirected edges.
A B C D E
A 0 1 0 1 1
B 1 0 1 0 0
C 0 1 0 1 0
D 1 0 1 0 1
E 1 0 0 1 0
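As a small illustration of how the adjacency matrix encodes the graph, the sketch below rebuilds Table 7 from its list of undirected edges and recovers the degree of each node. Variable names are illustrative; the edge list is read off the upper triangle of Table 7.

```python
nodes = ["A", "B", "C", "D", "E"]
# Undirected edges read off Table 7 (upper triangle of the matrix).
edges = [("A", "B"), ("A", "D"), ("A", "E"), ("B", "C"), ("C", "D"), ("D", "E")]

index = {name: k for k, name in enumerate(nodes)}
adjacency = [[0] * len(nodes) for _ in nodes]
for u, v in edges:
    adjacency[index[u]][index[v]] = 1
    adjacency[index[v]][index[u]] = 1   # undirected graph: A_jh = A_hj

# The degree of a node is the sum of its row of the adjacency matrix.
degrees = {name: sum(adjacency[index[name]]) for name in nodes}

for name, row in zip(nodes, adjacency):
    print(name, *row)
print(degrees)
```

Each undirected edge contributes two entries to the matrix, so the degrees sum to twice the number of edges.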

For some applications, necessary blocking factors may be obvious or based on covariates external to the graph, e.g., age or sex.

For others, it may be necessary or desirable to base the blocks on the graph structure itself, e.g., using spectral clustering (Koutra, Gilmour, and Parker 2021).
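One very simple form of graph-based blocking is spectral bisection: split the nodes according to the sign of the eigenvector for the second-smallest eigenvalue of the graph Laplacian (the Fiedler vector). The sketch below applies this to the adjacency matrix in Table 7. It is a minimal illustration of the idea, and is not necessarily the spectral clustering algorithm used in the cited paper.

```python
import numpy as np

nodes = ["A", "B", "C", "D", "E"]
A = np.array([[0, 1, 0, 1, 1],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [1, 0, 1, 0, 1],
              [1, 0, 0, 1, 0]], dtype=float)   # Table 7

# Unnormalised graph Laplacian: degree matrix minus adjacency matrix.
laplacian = np.diag(A.sum(axis=1)) - A
eigenvalues, eigenvectors = np.linalg.eigh(laplacian)   # eigenvalues in ascending order
fiedler = eigenvectors[:, 1]                            # second-smallest eigenvalue

# The sign of an eigenvector is arbitrary, so report the two groups as sets.
group_pos = {name for name, value in zip(nodes, fiedler) if value > 0}
group_neg = set(nodes) - group_pos
blocks = {frozenset(group_pos), frozenset(group_neg)}
print(sorted(sorted(block) for block in blocks))
```

For this graph the split is {A, D, E} and {B, C}. More than two blocks would need further eigenvectors and a clustering step, which this sketch does not attempt.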

Figure 2: Example graph \mathcal{G} for the adjacency matrix in Table 7 with |\mathcal{V}| = 5 vertices and |\mathcal{E}| = 6 undirected edges. Colours indicate an exemplar blocking into two groups.

Linear network model

The adjacency matrix can be used to incorporate indirect treatment effects into a model for the experiment (Koutra, Gilmour, and Parker 2021):

\begin{split} y_{ij} = \mu + \beta_i + \tau_{r(i,j)} + \sum_{g=1}^{b}\sum_{h=1}^{n_{g}} A_{\left\{ij,gh\right\}}\gamma_{r(g,h)} +\varepsilon_{ij}\,, \\ \quad i=1,\ldots,b\,,\,j = 1,\ldots, n_i\,. \end{split} \tag{3}

  • \beta_i is the ith block effect;
  • \tau_k and \gamma_k are the direct and indirect effects of treatment k.

Example

For the graph in Figure 2, with adjacency matrix in Table 7, the response from node A, the first unit in block 1, would be modelled as:

y_{11} = \mu + \beta_1 + \tau_{r(1,1)} + \gamma_{r(1,2)} + \gamma_{r(1,3)} + \gamma_{r(2,1)} + \varepsilon_{11}\,,

with the indirect treatment effects resulting from the edges between node A and nodes D and E (in block 1) and node B (in block 2). The linear network effects model can be estimated using least squares or maximum likelihood.
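In matrix form, the indirect-effect part of model (3) is the adjacency matrix multiplied by the treatment indicator matrix, so each row counts how many neighbours of a unit received each treatment. The sketch below builds the full model matrix for the five-node example. The treatment allocation is hypothetical, and the block membership of node C (block 2) is an assumption, since the text only places A, D and E in block 1 and B in block 2.

```python
import numpy as np

nodes = ["A", "B", "C", "D", "E"]
A = np.array([[0, 1, 0, 1, 1],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [1, 0, 1, 0, 1],
              [1, 0, 0, 1, 0]], dtype=float)   # Table 7

# Hypothetical allocation of t = 2 treatments (0-based labels): A, C, E get
# treatment 1 and B, D get treatment 2.
allocation = np.array([0, 1, 0, 1, 0])
# Assumed blocks (0-based): block 1 = {A, D, E}, block 2 = {B, C}.
block = np.array([0, 1, 1, 0, 0])

t, b = 2, 2
Z = np.eye(t)[allocation]     # n x t indicators for the direct effects tau
B = np.eye(b)[block]          # n x b indicators for the block effects beta
N = A @ Z                     # n x t: number of neighbours on each treatment

# Columns: intercept | block effects | direct effects | indirect effects.
X = np.hstack([np.ones((len(nodes), 1)), B, Z, N])
print(N[0])   # node A: one neighbour on treatment 1 (E), two on treatment 2 (B, D)
```

This X is not of full column rank (the block and treatment indicators each sum to the intercept column), so estimation works with estimable contrasts or a constrained parameterisation.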

Optimality criteria

Two possible aims from the experiment are

  1. estimation of pairwise differences between direct treatment effects, or
  2. estimation of pairwise differences between indirect treatment effects, if primary interest is in the viral effects of a treatment.

In either case, design selection will be based on model (3) with direct and indirect treatment effects being mutually adjusted.

A-optimality

For efficient estimation of direct treatment differences, designs are sought that minimise the average variance of the pairwise differences:

\phi_{\tau}=\frac{2}{t(t-1)} \sum_{s=1}^{t-1}\sum_{s'=s+1}^t \text{var}(\widehat{\tau_s-\tau_{s'}})\,. \tag{4}

Similarly, we can define a criterion for efficient estimation of indirect treatment differences, that minimises

\phi_{\gamma}=\frac{2}{t(t-1)} \sum_{s=1}^{t-1}\sum_{s'=s+1}^t \text{var}(\widehat{\gamma_s-\gamma_{s'}})\,. \tag{5}

Designs can be found via application of standard optimisation algorithms, such as point exchange (Cook and Nachtsheim 1980).
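For a very small network, criterion (4) can be evaluated for every possible allocation, which makes the definition concrete without needing an exchange algorithm. The sketch below does this for the five-node graph of Table 7 with t = 2 treatments. As simplifying assumptions, it omits the block effects from model (3), works in units of \sigma^2, and uses exhaustive search in place of point exchange; the function name is illustrative.

```python
import itertools
import numpy as np

A = np.array([[0, 1, 0, 1, 1],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [1, 0, 1, 0, 1],
              [1, 0, 0, 1, 0]], dtype=float)   # Table 7

def phi_tau(allocation, A, t=2):
    """Average variance (in units of sigma^2) of pairwise direct-effect differences."""
    Z = np.eye(t)[np.asarray(allocation)]
    X = np.hstack([np.ones((len(allocation), 1)), Z, A @ Z])   # mu | tau | gamma
    M = X.T @ X
    G = np.linalg.pinv(M, rcond=1e-10)   # generalised inverse: X is rank deficient
    P = G @ M                            # projector onto the row space of X
    variances = []
    for s, s2 in itertools.combinations(range(t), 2):
        c = np.zeros(X.shape[1])
        c[1 + s], c[1 + s2] = 1.0, -1.0
        if not np.allclose(P @ c, c, atol=1e-8):
            return np.inf                # contrast is not estimable for this design
        variances.append(c @ G @ c)
    return float(np.mean(variances))

# Exhaustive search over all 2^5 allocations (only feasible for tiny networks).
n = A.shape[0]
best = min(itertools.product(range(2), repeat=n), key=lambda d: phi_tau(d, A))
print(best, phi_tau(best, A))
```

Swapping the contrast columns from the \tau block to the \gamma block of X would give the analogous evaluation of criterion (5).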

Example - co-authorship network

Links between academics within a university research group (Koutra, Gilmour, and Parker 2021).

  • 22 nodes, split into three blocks, and 27 edges.
  • Interest lies in estimating the effects of two treatments.

Figure 3: Block designs for a co-authorship network with colours indicating blocks (identified via spectral clustering) and plotting symbol indicating allocation to treatment 1 or 2. Left: optimal design for estimation of direct effects. Right: optimal design for estimation of indirect effects.

Notes

Estimating direct effects

  • The \phi_{\tau}-optimal design is balanced, with equal replication of each treatment.
  • Treatment allocation is also balanced within each block.
  • Nodes allocated to each treatment have similar first- and second-order degrees.

Estimating indirect effects

  • Treatment allocation in the \phi_{\gamma}-optimal design is highly dependent on the network.
  • Highly connected nodes tend to receive a different treatment from their surrounding, less connected, nodes.

Comparisons to other designs

Quantitative comparisons can be made between the block network designs (BNDs) from Figure 3 and the optimal designs that would result from models that

  • ignore blocks and network structure (CRD: completely randomised design);
  • ignore network structure (RBD: randomised block design);
  • ignore blocks (LND: linear network design).

Efficiencies for estimating direct effects

Efficiencies are calculated within row.

Table 8: Efficiencies for estimating the direct treatment effects for designs with and without blocking and/or indirect effects under different model assumptions.
Designs
Model CRD RBD LND BND
CRM 1.00 1.00 1.00 1.00
RBM 0.89 1.00 0.68 1.00
LNM 0.86 0.83 1.00 1.00
BNM 0.73 0.81 0.50 1.00
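The efficiency definition is not spelled out above. A common convention for a criterion that is minimised, assumed here for illustration (the symbols d and d^{\star} are introduced only for this sketch), is

\text{Eff}(d) = \frac{\phi_{\tau}(d^{\star})}{\phi_{\tau}(d)}\,,

where d^{\star} is the \phi_{\tau}-optimal design under the model for that row, and both criterion values are computed under that model. Under this reading, the entry 0.50 for the LND under the BNM means that the average variance in (4) is twice as large for the LND as for the BND.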

Notes

Two features stand out from Table 8.

  1. The importance of including blocks in the optimal design, if they are present in the model. For example, the LND is only 50% efficient if blocks are added to the model. This is because balance within blocks is not achieved by the LND.
  2. The substantial loss in efficiency if network effects are excluded from the design but present in the model; e.g., the CRD and RBD lose around 15-25% efficiency compared to the LND/BND.

Efficiencies for estimating indirect effects

The loss of efficiency for designs that ignore network structure is now large (>80%).

Table 9: Efficiencies for estimating the indirect treatment effects for designs with and without blocking under different model assumptions.
Designs
Model CRD RBD LND BND
LNM 0.16 0.12 1.00 0.64
BNM 0.16 0.16 0.39 1.00

Case study - agricultural experiment

The impact of neighbouring plots in field trials has been widely considered, including through study of indirect treatment effects (Besag and Kempton 1986).

Figure 4: Example field layout, as used at Rothamsted

A typical layout of a field trial is shown in Figure 4, clearly showing the proximity of neighbouring plots.

Row-column experiment

Experiments conducted at Rothamsted to study the differences in natural cereal aphid colonization (Koutra et al. 2023).

  • 21 different wheat varieties.
  • Units arranged in a 14\times 6 grid of 1m x 1m plots.
  • There are sufficient plots for each treatment to be replicated four times.
  • Data from 2016 experiment.

Layout and 2016 experiment

Figure 5: Plot layout for the agricultural example with treatment allocation from the 2016 design. Numbers indicate treatments allocated to each plot.

Treatment interference

Treatment interference was thought possible due to the differing levels of susceptibility of different varieties and the strong possibility of aphids moving from plot to plot.

Differing structures governing this interference were considered, represented as graphs.

Figure 6: Network and optimal design for the wheat field trial. Numbers indicate treatments allocated to each plot.

Statistical model

In addition to direct and indirect treatment effects, the analysis of the experiment needed to account for the spatial structure through the inclusion of blocking factors and row-column effects.

\begin{aligned} y_j &=\mu+\tau_{r(j)}+ R_i+C_k+(RC)_{ik}+r_{ig}+c_{kh} \\ & +\left(rC\right)_{igk}+\left(Rc\right)_{ikh} +\sum_{j'} A_{jj'} \gamma_{r(j')} + \varepsilon_{j}\,, \end{aligned} \tag{6}

with R, C and RC representing the effects of super-rows and super-columns, and their interaction (super-blocks). Effects r and c are of rows and columns nested inside super-blocks.

Analysis of variance from 2016 experiment

Table 10: Analysis with network effects for the 2016 wheat experiment.
Sum Sq Mean Sq NumDF DenDF F-value p-value
Comparison 1
Indirect effect 32.58 1.63 20.00 31.76 3.24 0.0015
Direct effect 19.34 0.97 20.00 35.48 1.92 0.0437
Comparison 2
Direct effect 32.20 1.61 20.00 35.85 3.20 0.0012
Indirect effect 20.41 1.02 20.00 32.09 2.03 0.0361

Optimal design for estimating direct effects

Figure 7: Network and optimal design for the wheat field trial. Numbers indicate treatments allocated to each plot.

Notes and comparisons

  • There is a good spatial spread of treatments;
  • but it is also quite common for pairs of connected units to share a treatment.

Both these features have been observed in previous row-column and network designs; see Freeman (1979), Parker, Gilmour, and Schormans (2016) and Koutra, Gilmour, and Parker (2021).

A comparison with optimal designs found under various submodels shows a substantial loss in efficiency when blocking or network effects are ignored. The efficiency of the 2016 design (a resolvable row-column design) was 0.5.

Table 11: Efficiencies of optimal designs for various submodels of (6) when evaluated under the full model.
Designs
CRD RBD RCD BRCD LND BND RCND BRCND
Efficiency 0.4 0.43 0.46 0.51 0.46 0.5 0.72 1

Multi-objective designs

Model uncertainty

Various authors have written summary “checklists” for planning experiments, with items similar to “define the objectives” and “specify the model” (e.g., Dean, Voss, and Draguljić 2017). In general, such lists recognise

  1. the conflicting nature of many of the former;
  2. the a priori uncertainty in the latter.

We focus on using multi-objective optimal designs (Egorova and Gilmour 2023) to address uncertainty in the assumed model by combining individual criteria for

  • inference for an assumed model;
  • the ability to identify model lack-of-fit;
  • minimum mean squared error, including bias from model misspecification.

Response surface designs

We will discuss such methods in the context of response surface models of the form

\begin{split} Y_i & = \beta_0 + \sum_{j = 1} ^ {p} \beta_j x_{ij} + \varepsilon_{i} \\ & = \beta_0 + \boldsymbol{x}_{i1}^{\mathrm{T}}\boldsymbol{\beta}_1 + \varepsilon_{i}\,, \end{split} \tag{7}

  • \beta_0 and \boldsymbol{\beta}_1^{\mathrm{T}} = (\beta_1, \ldots, \beta_{p}) are unknown parameters.
  • \boldsymbol{x}_{i1}^{\mathrm{T}} = (x_{i1}, \ldots, x_{ip}) holds values of the p predictors for the ith run.

Standard optimality criteria

The most common design selection criteria aim at estimation for model (7) with \sigma^2 assumed known.

D-optimality \phi_{D}(\mathcal{D}) = \left|\left[X_1^{\mathrm{T}}\left(I_n - \frac{1}{n}J_n\right)X_1\right]^{-1}\right|\,.

L-optimality \phi_L(\mathcal{D}) = \text{tr}\left\{L^{\mathrm{T}}\left[X_1^{\mathrm{T}}\left(I_n - \frac{1}{n}J_n\right)X_1\right]^{-1}L\right\}\,.

Here, we treat the intercept \beta_0 as a nuisance parameter.

Both criteria assume model (7) is correctly specified.
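As a numerical illustration of the two criteria, the sketch below evaluates them for the 2^2 full factorial under a first-order model, taking L to be the identity matrix. The design, the choice of L and the function names are illustrative assumptions.

```python
import numpy as np

def centred_information(X1):
    """X1^T (I_n - J_n / n) X1: information for beta_1 with the intercept as nuisance."""
    n = X1.shape[0]
    Q = np.eye(n) - np.ones((n, n)) / n
    return X1.T @ Q @ X1

def phi_D(X1):
    """Determinant of the inverse centred information matrix."""
    return float(np.linalg.det(np.linalg.inv(centred_information(X1))))

def phi_L(X1, L=None):
    """Trace of L^T M^{-1} L; L defaults to the identity matrix."""
    M_inv = np.linalg.inv(centred_information(X1))
    L = np.eye(X1.shape[1]) if L is None else L
    return float(np.trace(L.T @ M_inv @ L))

# Illustrative design: the 2^2 full factorial with a first-order model (p = 2).
X1 = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
print(phi_D(X1), phi_L(X1))   # 1/16 and 1/2 for this orthogonal design
```

Both criteria are minimised, so smaller values indicate more precise estimation of \boldsymbol{\beta}_1.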

Primary and potential terms

Box and Draper (1959) introduced the idea of discrepancy between the assumed response surface model and an encompassing “true” model:

\begin{split} Y_i & = \beta_0 + \sum_{j = 1} ^ {p} \beta_j x_{ij} + \sum_{j = p+1} ^ {p+q} \beta_j x_{ij} +\varepsilon_i \\ & = \beta_0 + \boldsymbol{x}_{i1}^{\mathrm{T}}\boldsymbol{\beta}_1 + \boldsymbol{x}_{i2}^{\mathrm{T}}\boldsymbol{\beta}_2 + \varepsilon_i\,, \end{split} \tag{8}

where \boldsymbol{x}_{i2}^{\mathrm{T}} = (x_{i(p+1)}, \ldots, x_{i(p+q)}) holds the additional q polynomial terms, with associated parameters \boldsymbol{\beta}_2^{\mathrm{T}} = (\beta_{p+1}, \ldots, \beta_{p+q}).

DuMouchel and Jones (1994) labelled the polynomial terms in the assumed model as primary and the additional terms in the encompassing model as potential.

Mean squared error

One desirable aim is to be able to estimate \boldsymbol{\beta}_1 from model (7) protected from contamination from the potential terms.

Define the MSE matrix for \hat{\boldsymbol{\beta}}_1 (Montepiedra and Fedorov 1997):

\begin{split} \text{MSE}\left(\hat{\boldsymbol{\beta}}_1\right)& = \mathrm{E}_{\boldsymbol{Y}}[(\hat{\boldsymbol{\beta}}_1 -\boldsymbol{\beta}_1)(\hat{\boldsymbol{\beta}}_1 - \boldsymbol{\beta}_1)^{\mathrm{T}}]\\ & = \sigma^2[X_1^{\mathrm{T}} (I_n - \frac{1}{n}J_n) X_1]^{-1} + A_1\boldsymbol{\beta}_2\boldsymbol{\beta}_2^{\mathrm{T}} A_1^{\mathrm{T}}\,, \end{split} \tag{9}

where

A_1 = \left[X_1^{\mathrm{T}} \left(I_n - \frac{1}{n}J_n\right) X_1\right]^{-1}X_1^{\mathrm{T}} \left(I_n - \frac{1}{n}J_n\right)X_2

is the p\times q alias matrix between the primary and potential terms (excluding the intercept).

MSE(L) optimality

By analogy with variance-based alphabetic criteria, we can consider functionals of this matrix. Writing M = X_1^{\mathrm{T}} \left(I_n - \frac{1}{n}J_n\right) X_1,

\begin{split} \phi_{MSE(L)}(\mathcal{D}) & = E\left\{\text{trace}\left[\text{MSE}\left(\hat{\boldsymbol{\beta}}_1\right)\right]\right\} \\ & = \text{trace}\left\{E\left[\text{MSE}\left(\hat{\boldsymbol{\beta}}_1\right)\right]\right\} \\ & = \text{trace}\left[\sigma^2M^{-1} + E\left(A_1\boldsymbol{\beta}_2\boldsymbol{\beta}_2^{\mathrm{T}} A_1^{\mathrm{T}}\right)\right] \\ & = \sigma^2\text{trace}\left[M^{-1} + \tau^2 A_1 A_1^{\mathrm{T}}\right]\,. \end{split}

The expectation is taken with respect to a normal prior distribution, \boldsymbol{\beta}_2\sim \mathcal{N}\left(\boldsymbol{0}_q, \sigma^2\tau^2I_q\right), with \tau^2>0.
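To see the alias matrix and the MSE(L) criterion in a case where the bias term is not zero, the sketch below uses the 2^{3-1} half fraction with x_3 = x_1 x_2, main effects as primary terms and the three two-factor interactions as potential terms. Here M denotes X_1^{\mathrm{T}}(I_n - \frac{1}{n}J_n)X_1; the design, the values \sigma^2 = \tau^2 = 1 and the function name are illustrative assumptions.

```python
import itertools
import numpy as np

# Illustrative design: the 2^(3-1) half fraction with x3 = x1 * x2.
X1 = np.array([[-1, -1, 1], [-1, 1, -1], [1, -1, -1], [1, 1, 1]], dtype=float)
# Potential terms: the two-factor interactions x1x2, x1x3, x2x3.
X2 = np.column_stack([X1[:, i] * X1[:, j]
                      for i, j in itertools.combinations(range(3), 2)])

n = X1.shape[0]
Q = np.eye(n) - np.ones((n, n)) / n          # centring matrix I_n - J_n / n
M = X1.T @ Q @ X1
A1 = np.linalg.solve(M, X1.T @ Q @ X2)       # p x q alias matrix

def phi_mse_L(M, A1, tau2, sigma2=1.0):
    """sigma^2 * trace(M^{-1} + tau^2 * A1 A1^T)."""
    return float(sigma2 * np.trace(np.linalg.inv(M) + tau2 * A1 @ A1.T))

print(np.round(A1, 10))
print(phi_mse_L(M, A1, tau2=1.0))   # 3/4 (variance part) + 3 (bias part) = 3.75
```

Each main effect is fully aliased with one interaction in this fraction, so the bias part dominates the criterion; a design with A_1 = 0 would leave only the variance part.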

Pure error criteria

When uncertainty about the assumed model is being acknowledged, it is important that sufficient pure error degrees of freedom exist in the design to provide an unbiased estimator for \sigma^2.

Gilmour and Trinca (2012) suggested a class of criteria that explicitly incorporate the F-distribution quantiles on which parameter confidence regions depend.

DP-optimality \phi_{(DP)_S}(\mathcal{D}) = F_{p,d;1-\alpha}^p\phi_{D}(\mathcal{D})\,.

LP-optimality \phi_{LP}(\mathcal{D}) = F_{1,d;1-\alpha}\phi_L(\mathcal{D})\,.

where d = n-t is the number of pure error degrees of freedom, with t here denoting the number of distinct treatments (design points) in the experiment, \alpha is a pre-chosen significance level and F_{df1, df2; 1-\alpha} is the quantile of an F-distribution with df1 and df2 degrees of freedom such that the probability of being less than or equal to this quantile is 1-\alpha.

Model sensitivity

The ability of the design to make inference about the potential terms, and hence detect any lack of fit in the direction of model (8), can be quantified via functionals of R + \frac{1}{\tau^2}I_q, whose inverse is proportional to the posterior variance for \boldsymbol{\beta}_2.

LoF-DP-optimality \phi_{LoF-DP}(\mathcal{D}) = F^q_{q, d; 1-\alpha_{L}} \left|R + \frac{1}{\tau^2}I_q\right|^{-1}\,.

LoF-LP-optimality \phi_{LoF-LP}(\mathcal{D}) = F_{1, d; 1-\alpha_{L}} \text{tr}\left\{L^\top\left(R + \frac{1}{\tau^2}I_q\right)^{-1}L\right\}\,.

Both criteria target designs with matrices X_1 and X_2 being (near) orthogonal to each other, which will also maximise the power of the lack-of-fit test for the potential terms.
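The determinant part of the LoF-DP criterion can be illustrated on a small replicated design. The matrix R is not defined above, so the sketch assumes the usual form: the information on the potential terms after adjusting for the intercept and the primary terms, R = X_2^{\mathrm{T}}QX_2 - X_2^{\mathrm{T}}QX_1(X_1^{\mathrm{T}}QX_1)^{-1}X_1^{\mathrm{T}}QX_2 with Q = I_n - \frac{1}{n}J_n. The design and \tau^2 = 1 are also illustrative, and the F-quantile factor is left out because it needs a statistical library.

```python
import numpy as np

# Illustrative design: the 2^2 full factorial replicated twice (n = 8).
base = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
X1 = np.vstack([base, base])                   # primary terms x1, x2
X2 = (X1[:, 0] * X1[:, 1]).reshape(-1, 1)      # potential term x1 * x2 (q = 1)

n, q = X2.shape
Q = np.eye(n) - np.ones((n, n)) / n
M = X1.T @ Q @ X1
# Assumed definition of R: information on the potential terms after
# adjusting for the intercept and the primary terms.
R = X2.T @ Q @ X2 - X2.T @ Q @ X1 @ np.linalg.solve(M, X1.T @ Q @ X2)

tau2 = 1.0
det_part = 1.0 / np.linalg.det(R + np.eye(q) / tau2)   # |R + I_q / tau^2|^{-1}
pure_error_df = n - len({tuple(row) for row in X1})    # d = n - (number of distinct points)
print(R, det_part, pure_error_df)
```

Because the interaction column is orthogonal to both main-effect columns here, nothing is subtracted in R and it takes its largest possible value for n = 8 runs.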

MOODE

Multi-objective optimal design of experiments can be achieved via a compound criterion objective function constructed via a weighted product of individual objective functions.

Egorova and Gilmour (2023) defined a trace-based compound criterion as

\phi_{trace}(\mathcal{D}) = \phi_{LP}(\mathcal{D})^{\kappa_{LP}}\times \phi_{LoF-LP}(\mathcal{D})^{\kappa_{LoF-LP}} \times \phi_{MSE(L)}(\mathcal{D})^{\kappa_{MSE(L)}}\,,

with all weights \kappa \ge 0 and \kappa_{LP} + \kappa_{LoF-LP} + \kappa_{MSE(L)} = 1.
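The weighted product is straightforward to compute once the component criteria are available. The sketch below is a generic illustration with hypothetical component values. It is not the MOODE implementation, and the function name is an assumption.

```python
import math

def compound_criterion(values, weights):
    """Weighted product of component criteria; weights must be >= 0 and sum to 1."""
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights must be non-negative and sum to 1")
    # Components with zero weight are skipped, so they do not affect the product.
    return math.prod(v ** w for v, w in zip(values, weights) if w > 0)

# Hypothetical component values (phi_LP, phi_LoF-LP, phi_MSE(L)) for one design.
values = (2.0, 4.0, 8.0)
print(compound_criterion(values, (1/3, 1/3, 1/3)))   # geometric mean, close to 4
print(compound_criterion(values, (1.0, 0.0, 0.0)))   # reduces to phi_LP
```

With equal weights the criterion is the geometric mean of the components, so taking logarithms turns it into a weighted sum, which is often more convenient numerically.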

These compound criteria, along with their component criteria, are implemented in the R package MOODE (Koutra et al. 2024), available on CRAN.

Example

The 12-run Plackett-Burman design is perhaps the most widely used, and studied, non-regular fractional factorial design.

  • Orthogonal estimation of the main effects of up to 11 two-level factors
  • Main effect estimator for each factor is biased by all two-factor interactions not involving that factor.

We find alternative two-level designs for k=3,\ldots,9 factors using the trace-based compound criterion under five different sets of criteria weights.

The primary model consists of all k main effects, with the potential model also including all two-factor interactions.

The MOODE package can be used to find designs under these models and criteria.

Weights

Table 12: Individual criteria weights for five different compound criteria.
\kappa_1 \kappa_2 \kappa_3
0.33 0.33 0.33
0.25 0.25 0.5
1 0 0
0 1 0
0 0 1

Results

Figure 8: Efficiencies for MOODE and other optimal designs under different criteria.

Results

Table 13: Compound optimal design (left) for k=4 two-level factors with n=12 runs and \kappa_{LP} = \kappa_{LoF-LP} = \kappa_{MSE(L)} = 1/3, along with the corresponding LP-optimal (middle) and MSE(L)-optimal (right) designs.
Trt label x_{1} x_{2} x_{3} x_{4} Trt label x_{1} x_{2} x_{3} x_{4} Trt label x_{1} x_{2} x_{3} x_{4}
1 -1 -1 -1 -1 2 -1 -1 -1 1 1 -1 -1 -1 -1
4 -1 -1 1 1 2 -1 -1 -1 1 2 -1 -1 -1 1
6 -1 1 -1 1 2 -1 -1 -1 1 3 -1 -1 1 -1
6 -1 1 -1 1 7 -1 1 1 -1 5 -1 1 -1 -1
7 -1 1 1 -1 7 -1 1 1 -1 6 -1 1 -1 1
7 -1 1 1 -1 9 1 -1 -1 -1 8 -1 1 1 1
10 1 -1 -1 1 9 1 -1 -1 -1 9 1 -1 -1 -1
10 1 -1 -1 1 9 1 -1 -1 -1 11 1 -1 1 -1
11 1 -1 1 -1 12 1 -1 1 1 12 1 -1 1 1
11 1 -1 1 -1 12 1 -1 1 1 14 1 1 -1 1
13 1 1 -1 -1 14 1 1 -1 1 15 1 1 1 -1
16 1 1 1 1 14 1 1 -1 1 16 1 1 1 1

Notes 1

LP-efficiency:

  • L-optimal designs lack replication, so have few pure error degrees of freedom and zero efficiency under the LP-criterion.
  • The same holds for MSE(L)-optimal designs for larger k.
  • Compound optimal designs are at least 50% LP-efficient, and usually more.

MSE(L)-efficiency:

  • LP-optimal and compound designs have too much replication, and hence low efficiency, for small k.
  • For larger k, less replication is possible and efficiency improves.

Notes 2

For k=4:

  • the LP-optimal design has only 5 distinct points (the minimum number possible);
  • the compound design has 8 distinct points;
  • the MSE(L)-optimal design has 12 distinct points (obviously the maximum).

None of these designs are orthogonal in the main effects, a property of the Plackett-Burman design that is compromised to obtain either

  • PE degrees of freedom (LP);
  • or better robustness to the potential terms (compound and MSE(L)).

Compound and MSE(L)-optimal designs achieve orthogonality between the main effects and two-factor interactions, i.e. A_1 is a zero matrix.
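These properties can be checked directly from the treatment labels in Table 13. The sketch below decodes each label into factor levels (label 1 is all factors at -1, with x_4 changing fastest, which matches the rows of the table), then reports the number of distinct points, the pure error degrees of freedom, whether the main effects are orthogonal, and the largest absolute entry of the alias matrix A_1. Function and key names are illustrative assumptions.

```python
import itertools
import numpy as np

# Treatment labels for the three 12-run designs in Table 13.
designs = {
    "compound": [1, 4, 6, 6, 7, 7, 10, 10, 11, 11, 13, 16],
    "LP": [2, 2, 2, 7, 7, 9, 9, 9, 12, 12, 14, 14],
    "MSE(L)": [1, 2, 3, 5, 6, 8, 9, 11, 12, 14, 15, 16],
}

def levels(label, k=4):
    """Decode a treatment label (1..2^k) into k factor levels in {-1, +1}."""
    bits = format(label - 1, f"0{k}b")
    return [1 if bit == "1" else -1 for bit in bits]

def summarise(labels, k=4):
    X1 = np.array([levels(label, k) for label in labels], dtype=float)
    X2 = np.column_stack([X1[:, i] * X1[:, j]
                          for i, j in itertools.combinations(range(k), 2)])
    n = len(labels)
    Q = np.eye(n) - np.ones((n, n)) / n
    M = X1.T @ Q @ X1
    A1 = np.linalg.solve(M, X1.T @ Q @ X2)   # alias matrix, main effects vs interactions
    return {
        "distinct": len(set(labels)),
        "pure_error_df": n - len(set(labels)),
        "orthogonal_main_effects": bool(np.allclose(M, np.diag(np.diag(M)))),
        "max_abs_alias": float(np.abs(A1).max()),
    }

summary = {name: summarise(labels) for name, labels in designs.items()}
for name, result in summary.items():
    print(name, result)
```

The compound design turns out to be the half fraction with x_1 x_2 x_3 x_4 = +1 in which four of the eight points are run twice, which is why its alias matrix vanishes even though the unequal replication breaks orthogonality of the main effects.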

Summary

Conclusions

Two areas of active research:

  • treatment interference;
  • multi-objective design.

Both reduce the reliance on assumptions that may be unrealistic in many cases.

The topics could also be combined; e.g., multi-objective designs could be sought for networked experiments to estimate both direct and indirect treatment effects.

Further work

Both topics also intersect with other research areas in design of experiments.

  • Networked experimentation is also an active area within the causal inference community (e.g., Hudgens and Halloran 2008); a workshop was held at King’s in the summer of 2024.

  • Increasingly, experiments are taking place on very large networks, particularly in online experimentation on social media (e.g., Nandy et al. 2020); connections can be made to methods for subsampling large data using design of experiments principles (e.g., Yu, Ai, and Ye 2024).

References

Besag, J., and R. A. Kempton. 1986. “Statistical Analysis of Field Experiments Using Neighbouring Plots.” Biometrics, 231–51.
Bose, M., and A. Dey. 2015. “Crossover Designs.” In Handbook of the Design and Analysis of Experiments, 159–96.
Box, G. E. P., and N. R. Draper. 1959. “A Basis for the Selection of a Response Surface Design.” Journal of the American Statistical Association 54: 622–54.
Cook, R. D., and C. J. Nachtsheim. 1980. “A Comparison of Algorithms for Constructing Exact D-Optimal Designs.” Technometrics, 315–24.
Cox, D. R. 1958. Planning of Experiments. Wiley.
Dean, A., D. Voss, and D. Draguljić. 2017. Design and Analysis of Experiments. 2nd ed. Springer.
DuMouchel, W., and B. Jones. 1994. “A Simple Bayesian Modification of D-Optimal Designs to Reduce Dependence on an Assumed Model.” Technometrics 36: 37–47.
Egorova, O., and S. G. Gilmour. 2023. “Optimal Response Surface Designs in the Presence of Model Contamination.” arXiv:2208.05366.
Freeman, G. H. 1979. “Some Two-Dimensional Designs Balanced for Nearest Neighbours.” Journal of the Royal Statistical Society B, 88–95.
Gilmour, S. G., and L. A. Trinca. 2012. “Optimum Design of Experiments for Statistical Inference (with Discussion).” Journal of the Royal Statistical Society C 61: 345–401.
Hudgens, M. G., and M. E. Halloran. 2008. “Toward Causal Inference with Interference.” Journal of the American Statistical Association 103: 832–42.
Jones, B., and M. G. Kenwood. 2015. Design and Analysis of Cross-over Trials. Chapman & Hall/CRC Press.
Koutra, V., O. Egorova, S. G. Gilmour, and L. A. Trinca. 2024. “MOODE: An R Package for Multi-Objective Optimal Design of Experiments.” https://arxiv.org/abs/2412.17158.
Koutra, V., S. G. Gilmour, and B. M. Parker. 2021. “Optimal Block Designs for Experiments on Networks.” Journal of the Royal Statistical Society C, 596–618.
Koutra, V., S. G. Gilmour, B. M. Parker, and A. Mead. 2023. “Design of Agricultural Field Experiments Accounting for Both Complex Blocking Structures and Network Effects.” Journal of Agricultural, Biological and Environmental Statistics, 526–48.
Larsen, N., J. Stallrich, S. Sengupta, A. Deng, R. Kohavi, and N. T. Stevens. 2024. “Statistical Challenges in Online Controlled Experiments: A Review of A/B Testing Methodology.” The American Statistician 78: 135–49.
Montepiedra, G., and V. V. Fedorov. 1997. “Minimum Bias Designs with Constraints.” Journal of Statistical Planning and Inference 63: 97–111.
Nandy, P., K. Basu, S. Chatterjee, and Y. Tu. 2020. “A/B Testing in Dense Large-Scale Networks: Design and Inference.” In.
Parker, B. M., S. G. Gilmour, and J. Schormans. 2016. “Optimal Design of Experiments on Connected Units with Application to Social Networks.” Journal of the Royal Statistical Society C, 455–80.
Yu, J., M. Ai, and Z. Ye. 2024. “A Review on Design Inspired Subsampling for Big Data.” Statistical Papers 65: 467–510.